Utilizing VitalDBR to determine the pulse pressure variation’s influence on mortality of open surgery patients

Introduction / Motivation

This vignette shows why VitalDBR is useful if you want to do statistical analysis on data from VitalDB. The vignette is split into parts. First we prepare the data for analysis. The data we are interested in is the Arterial Blood Pressure (ART) and the Airway Pressure (AWP). We subset, filter and detect events in this data through signal processing. After that we show how to use the processed data and detected events to fit a Generalized Additive Model (GAM) and extract the pulse pressure variation. Finally, we extend the use case to a novel investigation into the influence of the Pulse Pressure Variation (PPV) on the time patients spend in the ICU (intensive care unit). To do this we fit a GAM for every other minute during an operation and compute the proportion of fits outside the interval $5 \le PPV \le 8$. Once this dataset is built, we use the results to do a Poisson regression with icu_days as the dependent variable. Afterwards we compare the simple Poisson regression to a zero-inflated Poisson regression and a pseudo compound Poisson regression. We of course include other important features such as age and BMI, and finally we discuss which confounding variables would have been relevant to include.
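The aggregation described above can be sketched with made-up numbers (the thresholds 5 and 8 come from the interval mentioned in the text; the PPV values below are invented for illustration):

```r
# Hypothetical PPV values (%), one per two-minute window of an operation
ppv_per_window <- c(4.2, 6.5, 9.1, 7.8, 3.9, 10.4)

# Proportion of windows outside the interval 5 <= PPV <= 8
mean(ppv_per_window < 5 | ppv_per_window > 8)  # 4 of 6 windows, i.e. 0.667
```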

Fitting a GAM

Preparing everything

Before we get started, we need a couple of packages.

# This is how you install VitalDBR!
library(devtools)
install_github('legendenomgeorg/VitalDBR/VitalDBR')
library(VitalDBR)
library(waveformtools) # This is a package made by our advisor, PhD student Johannes
library(tidyverse)
library(mgcv) # The standard R package for fitting GAMs
library(patchwork) # Package used for composing plots

Then we need to load some data. For that we have created the function “load_case”, which takes the track of interest (prefixed with the monitoring machine that records it) and the caseid.

data_art <- VitalDBR::load_case('SNUADC/ART', 1)
data_awp <- VitalDBR::load_case('Primus/AWP', 1)

In this case we load the tracks ART (Arterial Blood Pressure) and AWP (Airway Pressure) for patient 1. ART is measured by the “SNUADC” machine and AWP is measured by the “Primus” machine. Under the hood this function finds all tracks with the inputted caseid and then stitches together the right API call such that we get the right data. The format of the data also makes it easy to convert to a timeseries format, as the frequency is saved in the first row of the first column, like this:

tsp(ts(data_art[,2], frequency = 1/data_art[1,1])) # Output is first value, last value and frequency
## [1]     1.00 11541.31   500.00

But we will continue working in the data.frame format for this vignette.

To make sure that we work with the same data throughout the entire vignette, we start by defining an interval to work on:

start = 10000 # Interval starts at 10000 seconds
sec = 30 # 30 seconds from 10000, so the interval is 10000 sec to 10030 sec

Finding the inspiration of AWP

Now we introduce our function: “subset_data”. This function helps with choosing an interval of a timeseries defined by frequency. It automatically converts the frequency to seconds and subsets the data. The reason the time variable is not converted to the row index is that row indices are more difficult to work with.

sub_awp <- VitalDBR::subset_data(data = data_awp, seconds = sec, start_sec = start)
knitr::kable(head(sub_awp))
Time Primus.AWP
10000.00 7.3664
10000.02 7.3664
10000.03 7.3664
10000.05 7.3664
10000.06 7.3664
10000.08 7.3664

We now have the right subset of our AWP data, and we can simply plot it like so:

ggplot(sub_awp, aes(Time, Primus.AWP)) +
  geom_line()

Now, as stated in the introduction, we are interested in finding the inspirations in the airway pressure signal, that is, the beginnings of breaths. For that we have created the function “get_inspiration_start”. By default, it works by applying a convolution filter of the form (-1,-1,-1,-1,-1,-1,-1,-1,0,1,1,1,1,1,1,1,1) (note: stats::filter takes the convolution filter in reverse, so we input the reverse of this). This creates a waveform which peaks at sudden, large changes in amplitude, which means that we can use a standard peak-finding algorithm on the convolution to find the inspirations.

insp_start <- VitalDBR::get_inspiration_start(sub_awp)
knitr::kable(head(insp_start))
time
10001.65
10005.94
10010.22
10014.51
10018.78
10023.09

The function returns the times of the peaks in a data frame, and we can then plot the inspirations on the AWP signal:
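To build intuition for how the convolution detects inspirations, here is a minimal sketch of the idea (the kernel is the one described above; the toy signal and the single-peak lookup are invented for illustration, whereas the real function handles full waveforms with a proper peak-finding algorithm):

```r
# The convolution kernel described above: negative weights before the
# candidate point, zero at the point, positive weights after it
kernel <- c(rep(-1, 8), 0, rep(1, 8))

# A toy "airway pressure": flat at 5, then a sudden rise to 15 at index 51
awp <- c(rep(5, 50), rep(15, 50))

# stats::filter() applies the filter reversed, so we pass rev(kernel)
conv <- stats::filter(awp, rev(kernel), sides = 2)

# The convolution peaks at the sudden rise; which.max ignores the NA edges
which.max(conv)  # 50, right at the jump
```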

ggplot(sub_awp, aes(Time, Primus.AWP)) +
  geom_line() +
  geom_vline(aes(xintercept = time), color = "red",
             data = insp_start)

Finding the diastolic and systolic peaks of the Arterial Blood Pressure (ART)

To find the diastolic and systolic peaks of the Arterial Blood Pressure (ART) we first subset the data to the same interval as before. But noise in the ART data can disturb our ability to find the peaks properly. Luckily, you can give “subset_data” an optional argument called “filter”. If the filter argument is set to TRUE, a Butterworth filter is applied to the data. The result can be seen if you zoom into the plot below (interactively!). We can also set the cutoff frequency for the filter with the argument “cut_freq” (the default is 25).

sub_art <- VitalDBR::subset_data(data = data_art, seconds = sec, start_sec = start, filter=TRUE, cut_freq = 25)
knitr::kable(head(sub_art))

Time SNUADC.ART SNUADC.ART_filt
10000.00 103.760 58.23638
10000.00 104.748 69.50366
10000.00 104.748 79.71424
10000.01 105.735 88.46599
10000.01 105.735 95.61850
10000.01 105.735 101.21007

dygraph_signal(sub_art, 2, 3)

Now that the data is prepared we use a function created by our advisor Johannes, that finds the value and position of the peaks. It also finds the pulse pressure, which is simply the difference between the systolic and diastolic blood pressure: $PP = SBP - DBP$

beats <- waveformtools::find_abp_beats(sub_art, abp_col=3, time_col=1)[-1,] # we skip the first observation
## Sample rate not set. Returning beat length in samples
knitr::kable(head(beats))
time dia sys PP beat_len time_systole dia_pos sys_pos .noise_wiggliness .noise_pos_after_sys
10001.33 51.31872 110.6373 59.31856 365 10001.51 667 754 0.1605111 NA
10002.06 50.11974 108.6104 58.49065 363 10002.23 1032 1115 0.1429079 NA
10002.79 49.85253 108.3100 58.45746 362 10002.97 1395 1484 0.1512188 NA
10003.51 49.73168 109.7397 60.00801 362 10003.69 1757 1845 0.1516818 NA
10004.24 51.22496 111.3532 60.12822 366 10004.41 2119 2208 0.1582818 NA
10004.97 52.26622 112.2327 59.96645 365 10005.14 2485 2573 0.1810274 NA
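As a quick sanity check, the PP column is just the formula above applied row by row; for the first beat in the table (values copied from the dia and sys columns):

```r
# Systolic and diastolic values from the first row of the beats table
sys <- 110.6373
dia <- 51.31872

sys - dia  # 59.31858, matching the PP column up to display rounding
```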

Now we plot the Arterial blood pressure, together with the peaks and the inspiration:

ggplot(sub_art, aes(Time, SNUADC.ART)) +
  geom_line() +
  geom_vline(aes(xintercept = time), color = "red",
             data = insp_start) +
  geom_point(aes(x = time,
                 y = dia,
                 colour = "diastole"),
             data = beats) +
  geom_point(aes(x = time_systole,
                 y = sys,
                 colour = "systole"),
             data = beats)

Beautiful, right?

We can also plot the pulse pressure together with the inspirations. NOTE: as the pulse pressure is a discrete measurement, it is plotted with linear interpolation.

pp_plot <- ggplot(beats, aes(time, PP)) +
  geom_line() +
  geom_point() +
  geom_vline(aes(xintercept = time), color = "red",
  data = insp_start)
pp_plot

But we want to know which inspiration each pulse pressure measurement belongs to, and the position of the measurement relative to the inspiration. To do this we use a function created by our advisor Johannes, which adds “ann_n”, indicating which inspiration a measurement belongs to, and “ann_rel_index”, giving the position relative to the inspiration (i.e. it goes from 0 to 1 between consecutive inspirations).

beats_indexed <- waveformtools::add_time_since_event(beats, time_event = insp_start$time)
knitr::kable(head(beats_indexed))
time dia sys PP beat_len time_systole dia_pos sys_pos .noise_wiggliness .noise_pos_after_sys ann_index ann_n ann_cycle_len ann_rel_index
10001.33 51.31872 110.6373 59.31856 365 10001.51 667 754 0.1605111 NA NA NA NA NA
10002.06 50.11974 108.6104 58.49065 363 10002.23 1032 1115 0.1429079 NA 0.414 1 4.288 0.0965485
10002.79 49.85253 108.3100 58.45746 362 10002.97 1395 1484 0.1512188 NA 1.140 1 4.288 0.2658582
10003.51 49.73168 109.7397 60.00801 362 10003.69 1757 1845 0.1516818 NA 1.864 1 4.288 0.4347015
10004.24 51.22496 111.3532 60.12822 366 10004.41 2119 2208 0.1582818 NA 2.588 1 4.288 0.6035448
10004.97 52.26622 112.2327 59.96645 365 10005.14 2485 2573 0.1810274 NA 3.320 1 4.288 0.7742537

This plot then colors the pulse pressure measurements according to which inspiration they belong to. On the right plot we can also start to see why this is important. It seems that there is a relationship between inspiration and pulse pressure. This is the relationship we want to model with the GAM framework.

pp_plot_color <- ggplot(beats_indexed, aes(time, PP)) +
  geom_line() +
  # ann_n is a unique (consecutive) number for each respiratory cycle
  geom_point(aes(color = as.factor(ann_n)), show.legend = FALSE) +
  geom_vline(aes(xintercept = time), color = "red",
             data = insp_start) +
  labs(title = "Pulse pressure",
       subtitle = "by time. Color indicates respiratory cycle")

pp_insp_plot <- ggplot(beats_indexed,
                       aes(ann_rel_index,
                           PP,
                           group = as.factor(ann_n),
                           color = as.factor(ann_n))) +
  geom_line(alpha = 0.3, show.legend = FALSE) +
  geom_point(aes(color = as.factor(ann_n)), show.legend = FALSE) +
  labs(subtitle = "by position in the respiratory cycle")

pp_plot_color + pp_insp_plot

## Warning: Removed 5 row(s) containing missing values (geom_path).
## Warning: Removed 5 rows containing missing values (geom_point).

Fitting the GAM

We can generally write the GAM structure as: \[g(E(Y)) = \alpha + s_1(x_1) + \dots + s_p(x_p)\] In our case this translates to: \[g(E(PP)) = \alpha + s_1(ann\_rel\_index) + s_2(time)\]

PP_data <- beats_indexed[,c("PP","time","ann_rel_index")]

PP_gam <- gam(
  # The first parameter to the gam() function is the model specification,
  # supplied using formula notation:
  PP ~ # Left of the tilde (~) is our dependent variable PP.
    # Right of the tilde are our independent variables.
    # Define a smooth function of ann_rel_index.
    s(ann_rel_index,
      k = 10,   # 10 knots.
      bs = "cc" # The basis is a cyclic cubic spline.
    ) +
    # Define a smooth function of time.
    s(time,
      bs = "cr" # The basis is a natural cubic spline.
      # The default k is 10. This will be fine here.
    ),
  # We can specify the positions of the knots for each smooth.
  # If only two knots are specified for a cyclic spline, these will
  # set the positions of the limiting knot(s). The remaining knots will
  # be positioned automatically (at quantiles).
  knots = list(ann_rel_index = c(0, 1)),
  # We use restricted maximum likelihood (REML) to fit the optimal smoothing parameter.
  # This is often the best choice, but not the default.
  method = "REML",
  data = PP_data
)

gratia::draw(PP_gam,
             residuals = TRUE)

We need the last part of the right hand side, the intercept. Since mgcv’s smooth terms are centered (they sum to zero over the data), the intercept also equals the overall mean of PP:

alpha <- coef(PP_gam)[1] # intercept
alpha
## (Intercept) 
##    59.64833

Now we can construct the expected value of Y, the pulse pressure, by adding the intercept α, s(ann_rel_index) and s(time).
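To make this concrete, here is a self-contained sketch on simulated data (not the VitalDB signals) showing that, for a Gaussian GAM with identity link, adding the intercept to the per-term contributions from predict(..., type = "terms") reproduces the fitted values:

```r
library(mgcv)

set.seed(1)
x1 <- runif(200)
x2 <- runif(200)
y <- 2 + sin(2 * pi * x1) + x2^2 + rnorm(200, sd = 0.1)

# Same structure as PP_gam: a cyclic smooth plus a cubic-regression smooth
m <- gam(y ~ s(x1, bs = "cc") + s(x2, bs = "cr"),
         knots = list(x1 = c(0, 1)), method = "REML")

terms <- predict(m, type = "terms")   # one column per smooth
recon <- coef(m)[1] + rowSums(terms)  # alpha + s(x1) + s(x2)

all.equal(unname(recon), unname(fitted(m)))  # TRUE
```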

Time to calculate PPV

calc_PPV <- function(smooth, intercept) {
  min_PP <- min(smooth)
  max_PP <- max(smooth)
  (max_PP - min_PP) / unname(intercept)
}

splines <- predict(PP_gam, type = "terms")
PPV <- calc_PPV(splines[,1], intercept = coef(PP_gam)[1])
PPV
## [1] 0.05500996
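To see what calc_PPV computes, here is the function again with made-up numbers (invented for illustration): the range of the cyclic smooth divided by the intercept, i.e. the swing in pulse pressure over a respiratory cycle relative to the mean pulse pressure.

```r
calc_PPV <- function(smooth, intercept) {
  min_PP <- min(smooth)
  max_PP <- max(smooth)
  (max_PP - min_PP) / unname(intercept)
}

# Hypothetical smooth contributions: the cyclic smooth swings +/- 3 mmHg
s_vals <- c(-3, 0, 3)
alpha  <- c("(Intercept)" = 60)  # a mean pulse pressure of 60 mmHg

calc_PPV(s_vals, alpha)  # (3 - (-3)) / 60 = 0.1, i.e. a PPV of 10%
```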

Association study - does PPV influence survival of patients?

This part of the vignette revolves around utilizing what we have previously done to make a large-scale association study of whether the Pulse Pressure Variation (PPV) during an operation influences the outcome.

Creating the dataset

To create the dataset we need to calculate the PPV for every other minute of every operation… That is a lot of computation! Therefore we have added the final dataset to the repo, so that you do not have to spend sleepless nights hoping that R doesn’t crash during its 1147th iteration. But before we can calculate the PPV we need to restrict our dataset to the following criteria:

  • The surgery approach should be “Open”
  • The department should be “General surgery”
  • The type of anesthesia should be “General”

(These restrictions were decided by our advisers.) We do that as such:

cases <- VitalDBR::load_VDB("https://api.vitaldb.net/cases") %>%
      dplyr::filter(approach=="Open",
                department=="General surgery",
                ane_type=="General") %>%
      dplyr::select(caseid, death_inhosp, icu_days, age, sex, asa, emop, bmi, opstart, opend)
knitr::kable(head(cases))
caseid death_inhosp icu_days age sex asa emop bmi opstart opend
1 0 0 77 M 2 0 26.3 1668 10368
2 0 0 54 M 2 0 19.6 1721 14621
5 0 13 66 M 3 1 20.4 2591 20291
8 0 0 81 F 2 0 27.4 752 4952
12 0 4 46 F 2 0 28.4 5360 30860
16 0 0 57 M 2 0 26.2 2575 12175

As we know from the first part of this vignette, we need both the Arterial Blood Pressure (ART) and the Airway Pressure (AWP) to fit the GAM. Therefore we also restrict our data to cases which have both tracks, like so:

tracks <- VitalDBR::load_VDB("https://api.vitaldb.net/trks") %>%
    dplyr::filter(tname == "Primus/AWP" | tname == "SNUADC/ART") %>%
    count(caseid) %>%
    dplyr::filter(n == 2)

knitr::kable(head(tracks))
caseid n
1 2
3 2
4 2
7 2
10 2
12 2

We then join those two datasets together by their case id, which results in the final dataset:

merged <- merge(x=tracks,y=cases,by="caseid") %>% dplyr::select(-one_of("n"))

knitr::kable(head(merged))
caseid death_inhosp icu_days age sex asa emop bmi opstart opend
1 0 0 77 M 2 0 26.3 1668 10368
12 0 4 46 F 2 0 28.4 5360 30860
16 0 0 57 M 2 0 26.2 2575 12175
17 0 1 85 M 2 0 19.7 3540 19140
19 0 2 74 M 2 0 22.6 2184 26484
25 0 1 66 M 2 0 27.8 1882 13882

And now for the part where we combine everything we have done so far to create the dataset we need to be able to do statistical analysis on the effects of the PPV on patients.

A brief overview of what the two functions below do: “ppv_prepare” splits a single operation into two-minute intervals and, for each interval, repeats the pipeline from the first part of the vignette (subset, find inspirations, find beats, fit the GAM) to produce one PPV value per interval. “process_cases” loops over all cases, loads the ART and AWP tracks, calls “ppv_prepare”, and records the proportion of intervals with PPV ≤ 5 and with PPV ≥ 8.

calc_PPV <- function(smooth, intercept) {
  min_PP <- min(smooth)
  max_PP <- max(smooth)
  (max_PP - min_PP) / unname(intercept)
}

ppv_prepare <- function(case, start, end, data_art, data_awp){
  # This takes an unreasonably long time, so we need to get that load function fixed...
  # Otherwise it is just a matter of implementing the PPV calculation here
  op <- end - start
  interval <- 120
  iterations <- floor(op/interval)
  data <- rep(NA_real_, iterations)
  skip_to_next <- FALSE
  for (i in seq_len(iterations)){ # 1-indexed so that no window is dropped
    try_catch <- tryCatch(
      {
        sub_awp <- VitalDBR::subset_data(data = data_awp, seconds = interval, start_sec = start + interval*(i-1))
        insp_start <- VitalDBR::get_inspiration_start(sub_awp)
        sub_art <- VitalDBR::subset_data(data = data_art, seconds = interval, start_sec = start + interval*(i-1), filter=TRUE, cut_freq = 25)
        beats <- find_abp_beats(sub_art, abp_col=3, time_col=1)
        beats_indexed <- waveformtools::add_time_since_event(beats, time_event = insp_start$time)
        rm(beats)
        PP_data <- beats_indexed[,c("PP","time","ann_rel_index")]
        PP_gam <- gam(
          PP ~ s(ann_rel_index, k = 10, bs = "cc") + s(time, k = 10, bs = "cr"), # splines
          knots = list(ann_rel_index = c(0,1)), method = "REML", data = PP_data)
        splines <- predict(PP_gam, type = "terms")
        data[i] <- calc_PPV(splines[,1], intercept = coef(PP_gam)[1])*100
        cat("Iteration", i, "out of", iterations, ". For case:", case, "\n")
        rm(PP_data)
      },
      error = function(e){
        skip_to_next <<- TRUE
      }
    )
    if(skip_to_next) {
      data[i] <- NA
      skip_to_next <- FALSE # reset, so one bad interval does not skip the rest
      next
    }
  }
  return(data)
}

process_cases <- function(data){
  ppv_under5 <- data.frame(matrix(NA, nrow = nrow(data)))
  ppv_over8 <- data.frame(matrix(NA, nrow = nrow(data)))
  counter <- 0
  break_and_save <- FALSE
  for (caseid in data$caseid){
    counter <- counter + 1
    closeAllConnections()
    try_catch <- tryCatch(
      {
        cat("Importing ART for case:", caseid, "\n")
        art <- VitalDBR::load_case('SNUADC/ART', caseid)
        cat("Importing AWP for case:", caseid, "\n")
        awp <- VitalDBR::load_case('Primus/AWP', caseid)
      },
      error = function(e){
        break_and_save <<- TRUE
      })
    if(break_and_save) {
      cat("Something went wrong when loading case:", caseid, "Saving results so far", "\n")
      break
    }
    start <- data$opstart[data$caseid==caseid]
    end <- data$opend[data$caseid==caseid]
    ppv_results <- na.omit(ppv_prepare(caseid, start, end, art, awp))
    rm(art)
    rm(awp)
    len_ppv <- length(ppv_results)
    ppv_under5[counter, 1] <- sum(ppv_results<=5)/len_ppv
    ppv_over8[counter, 1] <- sum(ppv_results>=8)/len_ppv
  }
  colnames(ppv_under5) <- c('ppv_under5')
  colnames(ppv_over8) <- c('ppv_over8')
  data <- cbind(data, ppv_under5)
  data <- cbind(data, ppv_over8)
  return(data)
}

#ppv_data <- process_cases(merged)

Notes:

  • This is a very computationally intensive function. It took almost 16 hours to calculate the PPV aggregates for all 1360 operations. Running a profiler revealed that the function “find_abp_beats” was the main culprit: over 80% of the time was spent in that function. Future work would preferably optimize it. From our first iteration of the function to the end result we substantially decreased the amount of RAM and CPU time needed, both by utilizing R’s vectorization and by managing our RAM during the algorithm, removing variables manually as soon as they were no longer needed.

  • A catch-all approach was used in two places in this function. Due to data irregularities in some of the cases, we decided to put all calculations in one big tryCatch clause. We initially spent a long time trying to account for all edge cases, but in the end we decided that if an interval of the timeseries was of very bad data quality, we would rather set it to NA than keep trying to adapt our function around it. This was done after a discussion with our advisors, who suggested that we are not at all interested in data of such bad quality that no “organic process” can have caused the decline.

Regression analysis

Now that we have created our dataset, it is ready for the regression analysis outlined in the introduction.
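As a preview of the planned analysis, here is a hedged sketch on simulated data (all numbers invented; the real analysis would use ppv_data, with icu_days as the dependent variable and ppv_under5/ppv_over8, age, bmi, etc. as covariates). We assume the pscl package for the zero-inflated Poisson model:

```r
library(pscl)  # provides zeroinfl() for zero-inflated count models

set.seed(1)
n <- 500
ppv_over8 <- runif(n)              # proportion of windows with PPV >= 8
age <- rnorm(n, mean = 60, sd = 10)

# Simulate ICU days with excess zeros: many patients never go to the ICU
mu <- exp(-1 + 1.5 * ppv_over8 + 0.01 * age)
icu_days <- rpois(n, mu) * rbinom(n, 1, 0.4)
sim <- data.frame(icu_days, ppv_over8, age)

# Simple Poisson regression
pois_fit <- glm(icu_days ~ ppv_over8 + age, family = poisson, data = sim)

# Zero-inflated Poisson regression
zip_fit <- zeroinfl(icu_days ~ ppv_over8 + age, data = sim)

# With this many structural zeros, the zero-inflated model should fit better
AIC(pois_fit, zip_fit)
```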

TO DO

  • In our project description we wrote that we would add functions to the package that visualize GAMs. We should make sure to get that done.
  • We should also give a broad explanation of what the big function does. It would make sense for the rest of you to write it, so that you understand the function better.

Exploring the importance of the outcomes NOTES:

What we now want to do is investigate the importance of the pulse pressure variation on patients’ 30-day mortality. For that we will fit a logistic regression that looks like this: \[mortality_i \sim PPV\_thresh_i + art\_mbp_i + Age_i + Sex_i + Asa_i + Eme\_ope_i + BMI_i\] For that we need a dataset that looks like this:

Case <- c(29,600,1009,1900)
mortality <- c(0,1,1,0)
PPV_thresh <- c(8,9,6,10)
ART_MBP <- c(54,65,69,70)
Age <- c(22,47,87,4)
Sex <- c(1,0,0,1)
Asa <- c(6,4,1,4)
Eme_ope <- c(1,0,0,1) # emergency operation
BMI <- c(32,16,29,15)
data.frame(Case, mortality, PPV_thresh, ART_MBP, Age, Sex, Asa, Eme_ope, BMI)
##   Case mortality PPV_thresh ART_MBP Age Sex Asa Eme_ope BMI
## 1   29         0          8      54  22   1   6       1  32
## 2  600         1          9      65  47   0   4       0  16
## 3 1009         1          6      69  87   0   1       0  29
## 4 1900         0         10      70   4   1   4       1  15

Explanations:

Case: - The cases should be filtered so that we only keep operations of the types “Open”, “General surgery” and “General anesthesia”. This requires an extended load function. There should be a little under 3000 observations.

PPV_thresh: - If an operation lasts, say, 180 minutes, we fit 180 GAMs and extract the 180 pulse pressure variations (that amplitude-over-mean quantity for the pulse pressure). We want to know how much of the time the PPV lies outside the interval (6,12) (domain knowledge). There are a few different ways this can be done, but it should end up as either a single combined number, or perhaps two columns with the proportions PPV_high and PPV_low. - We can use “anestart” and “aneend” from the clinical information API to select only the interval of the operation where the patient is under anesthesia.

ART_MBP (low priority): - ART_MBP (mean arterial pressure) is actually part of SOLAR8000, so maybe we should also restrict our cases to that? - We are interested in how often ART_MBP lies below 65 (domain knowledge), so we just need the proportion of ART_MBP values below 65.

Age, sex, asa, Eme_ops and BMI: - These are just some confounders we need to include; they are available in the clinical information API.

Concrete tasks (prioritized):

1. Upgrade the load function so it can load based on criteria in the clinical information API - High priority. I (Georg) will probably take this one, since I have written most of the load functionality.

2. Make a function that takes a span of x minutes and outputs x PPV values - High priority.

3. Make a function that takes a case ID and fetches: Age, sex, asa, Eme_ops and BMI.
  • (Medium priority) This does not require task 1 to be solved, so one of you can just take it.

4. Extend the subset function so that you can specify that it should only look at the interval between “anestart” and “aneend”.
  • (Low priority) Might require task 1 to be solved, but not certain.

General development notes: - From now on we do not upload more functions to the package; instead we keep them in the documents we write in and collect everything at the end, so we avoid constantly having to push to master. - It would be best if we stopped pushing to master altogether and instead each worked in our own branch and then merged. - Let’s keep the vignette as it is now, and otherwise make a document for each function we work on, copying in the parts we need from the vignette.

Questions: - If we want to use ART_MBP, shouldn’t we also restrict our cases to those recorded on SOLAR8000? - Should we define PPV from one number or from two numbers, as described in the description of PPV_thresh?